fix: LLM-based comment resolution + cap commit fetch#21
Conversation
Replace heuristic-based resolver (location/title matching against new_findings) with LLM verification that reads the current code and confirms whether each old finding was actually fixed. The old approach falsely marked findings as "Addressed" when they were simply not re-flagged (e.g., due to dedup filtering or formatting-only changes). Also cap get_pr_commits() at 50 most recent commits to avoid unbounded API calls and prompt inflation on long-lived PRs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| logger.debug("LLM resolution check failed for comment %d", comment["id"]) | ||
| return ResolutionVerdict(verdict="uncertain", reasoning="LLM check failed") | ||
|
|
||
|
|
||
| async def resolve_addressed_comments( | ||
| new_findings: list[Finding], | ||
| repo: str, | ||
| pr_number: int, | ||
| head_sha: str, | ||
| scm: SCMProvider, |
There was a problem hiding this comment.
🟡 Issue — worth fixing: LLM call per old comment, no token budget on context
Each old DiffFox comment triggers a separate LLM call with up to 61 lines of code context plus the comment body. With max_tokens=512 set for the response, the input token cost is unbounded — a large file with many old comments could send thousands of input tokens per call, multiplied by the number of unresolved comments per PR review.
Fix: Add a hard cap on the number of comments checked per PR (e.g., MAX_COMMENTS_TO_CHECK = 20) and log a warning when skipping. Also consider caching verdicts by (comment_id, head_sha) so re-runs of the same review don't re-spend tokens.
There was a problem hiding this comment.
✅ Addressed — this issue is no longer detected in the latest review.
There was a problem hiding this comment.
Fixed — added MAX_COMMENTS_TO_CHECK = 20 cap with a warning log when exceeded.
| # Run LLM verification concurrently (capped) | ||
| semaphore = asyncio.Semaphore(MAX_CONCURRENT_CHECKS) | ||
|
|
||
| async def check_with_limit(c: DiffFoxComment) -> tuple[DiffFoxComment, ResolutionVerdict]: |
There was a problem hiding this comment.
🟡 Issue — worth fixing: Already-resolved check uses pre-filtered candidates list
The 'already resolved' check (lines 143-146) is done before building candidates, so it correctly skips resolved comments. However, the all_reply_text check looks for 'Addressed' or 'Acknowledged' in bot replies — but the bot's new reply text uses em-dash (—) not a hyphen, and the check is purely substring-based. This is fine for the new text, but if a user happens to write 'Addressed' in their own reply, the comment will be permanently skipped even if the issue was never actually resolved by the bot.
Fix: Consider checking only bot-authored replies (e.g., filter all_replies by author) rather than all reply text when determining if a comment is already resolved.
There was a problem hiding this comment.
✅ Addressed — this issue is no longer detected in the latest review.
There was a problem hiding this comment.
Fixed — now matching exact bot reply prefix (✅ **Addressed** / ✅ **Acknowledged**) instead of bare substring matching.
src/diff_fox/scm/github.py
Outdated
| """Fetch commits for a pull request (capped at 50 most recent).""" | ||
| data = await self._get_paginated(f"/repos/{repo}/pulls/{pr_number}/commits") |
There was a problem hiding this comment.
🟡 Issue — worth fixing: Cap applied after full paginated fetch, not before
get_pr_commits() still fetches ALL pages via _get_paginated() and only slices to the last 50 after the fact. For PRs with hundreds of commits this still makes unbounded API calls and incurs the full network/rate-limit cost — the cap only reduces prompt inflation, not API call count.
Fix: Pass a per_page=50 parameter and fetch only the last page, or add an early-exit in _get_paginated once a page limit is reached. Alternatively, document that this only caps prompt size, not API calls.
There was a problem hiding this comment.
✅ Addressed — this issue is no longer detected in the latest review.
There was a problem hiding this comment.
Fixed — switched from _get_paginated() to a single _get() call with per_page=50, so only one API request is made.
- Cap resolution checks at 20 comments per PR to bound LLM token cost - Match exact bot reply prefix for already-resolved check instead of bare substring matching (prevents false skip if user writes "Addressed") - Use single-page API call (per_page=50) instead of full paginated fetch to avoid unbounded API calls on PRs with hundreds of commits Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ DiffFox — No Issues FoundReviewed PR #21 in DiffFox |
Summary
new_findings) with LLM verification that reads the actual current code and confirms whether each old finding was fixedget_pr_commits()to 50 most recent commits to prevent unbounded API calls and prompt inflation.gitignore: Exclude__pycache__/,.pyc, build artifactsProblem
PR #20 exposed a flaw: DiffFox flagged "unbounded paginated commit fetch" (valid finding), but a formatting-only follow-up push caused the resolver to mark it "Addressed" because:
new_findings(by design)new_findings→ assumed it was fixedHow it works now
For each old DiffFox comment:
head_shaverdict: "fixed"with reasoningTest plan
🤖 Generated with Claude Code